Using MAP estimation of feature transformation for speaker recognition
Abstract
We propose a new feature transformation (FT) function for constructing the supervectors used by support vector machines (SVMs) in speaker recognition. Because bias vectors can be estimated more robustly than transformation matrices, we define the FT function in a flexible form in which transformation matrices and bias vectors are controlled by separate regression classes. Unlike the MLLR-based approach, which requires a continuous speech recognition system, our FT function parameters are estimated from a Gaussian mixture model (GMM). An iterative training procedure yields the maximum a posteriori (MAP) estimate of the FT function parameters, which avoids the numerical problems that insufficient training data can cause in maximum likelihood estimation. Our approach is evaluated on the NIST SRE 2006 evaluation and outperforms a conventional SVM system based on GMM mean supervectors.
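The abstract does not give the exact form of the FT function, so the following is only a minimal sketch under stated assumptions. It assumes a posterior-weighted affine transform, y = sum_k p(k|x) (A[r(k)] x + b[s(k)]), where r(k) and s(k) map each GMM component k to a matrix class and a bias class respectively. All names (`gmm_posteriors`, `transform_frame`, `supervector`, `mat_class`, `bias_class`) and all numbers are hypothetical and chosen for illustration. Since the abstract states that bias estimation is the more robust of the two, the toy setup uses one shared matrix class and a separate bias class per component.

```python
import math

def gmm_posteriors(x, weights, means, variances):
    """Posterior of each diagonal-covariance Gaussian given one frame x."""
    log_p = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_p.append(ll)
    m = max(log_p)  # subtract the max for numerical stability
    exp_p = [math.exp(lp - m) for lp in log_p]
    total = sum(exp_p)
    return [p / total for p in exp_p]

def transform_frame(x, post, A, b, mat_class, bias_class):
    """Assumed FT form: y = sum_k post[k] * (A[mat_class[k]] x + b[bias_class[k]])."""
    dim = len(x)
    y = [0.0] * dim
    for k, g in enumerate(post):
        Ak = A[mat_class[k]]   # matrix chosen by the matrix regression class
        bk = b[bias_class[k]]  # bias chosen by the (separate) bias regression class
        for i in range(dim):
            y[i] += g * (sum(Ak[i][j] * x[j] for j in range(dim)) + bk[i])
    return y

def supervector(A, b):
    """Stack all matrix entries and bias entries into one flat vector."""
    sv = [a for mat in A for row in mat for a in row]
    sv += [v for vec in b for v in vec]
    return sv

# Toy 2-dimensional GMM with two components (illustrative numbers only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# One shared matrix class, two bias classes (assumption for illustration).
A = [[[1.0, 0.0], [0.0, 1.0]]]
b = [[0.1, 0.1], [-0.1, -0.1]]
mat_class = [0, 0]
bias_class = [0, 1]

x = [0.0, 0.0]
post = gmm_posteriors(x, weights, means, variances)
y = transform_frame(x, post, A, b, mat_class, bias_class)
sv = supervector(A, b)
```

In this sketch `sv` plays the role of the per-speaker supervector that would be passed to an SVM. How the parameters in `A` and `b` are actually estimated (the iterative MAP procedure) is not shown here, because the abstract does not specify it.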
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Acoustic analysis and feature transformation from neutral to whisper for speaker identification within whispered speech audio streams
Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in vocal excitation and vocal tract function, the performance of automatic speaker identi...
Recognition of Persian accents from the speech signal using efficient feature extraction methods and classifier combination
Speech recognition has improved greatly in recent years. However, robustness remains a major problem: recognition performance fluctuates sharply depending on the speaker, especially when the speaker has a strong accent, and different accents dramatically decrease the accuracy of an ASR system. In this paper we apply three new methods of feature extraction including Spectral C...
Speaker recognition based on discriminative feature extraction - optimization of mel-cepstral features using second-order all-pass warping function
This paper describes a new framework for designing speaker recognition systems based on the discriminative feature extraction (DFE) method. We apply a mel-cepstral estimation technique to the feature extractor in a Gaussian mixture model (GMM)-based text-independent speaker identification system. The mel-cepstral estimation technique uses the second-order all-pass warping function for frequency...